In this paper, we present a study on sample preselection in large training data sets for CNN-based classification. To do so, we structure the input data set in a network representation, namely the Relative Neighbourhood Graph, and then extract vectors of interest from it. The proposed preselection method is evaluated in the context of handwritten character recognition, using two data sets containing up to several hundred thousand images. It is shown that the graph-based preselection can reduce the training data set without degrading the recognition accuracy of a non-pretrained shallow CNN model.
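As a rough illustration of the graph structure mentioned above, the following minimal sketch builds a Relative Neighbourhood Graph by brute force: two samples are linked when no third sample is closer to both of them than they are to each other. The function names, the Euclidean distance, and the selection rule in `preselect_boundary` (keeping samples that have a neighbour of a different class) are assumptions made for illustration only. The abstract does not specify how the vectors of interest are extracted, and the cubic-time construction here would not scale to hundreds of thousands of images.

```python
import numpy as np

def relative_neighbourhood_graph(X):
    """Return the set of RNG edges (i, j), i < j, for the rows of X."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # For every candidate k, the larger of its distances to i and j.
            lune = np.maximum(d[i], d[j])
            lune[[i, j]] = np.inf  # i and j cannot block their own edge
            # Keep the edge only if no k is closer to both endpoints.
            if not np.any(lune < d[i, j]):
                edges.add((i, j))
    return edges

def preselect_boundary(X, y):
    """Hypothetical rule: keep samples with an RNG neighbour of another class."""
    keep = set()
    for i, j in relative_neighbourhood_graph(X):
        if y[i] != y[j]:
            keep.update((i, j))
    return sorted(keep)

# Toy data: two well-separated groups on a line, one class each.
X_toy = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]]
y_toy = [0, 0, 0, 1, 1]
print(sorted(relative_neighbourhood_graph(X_toy)))
print(preselect_boundary(X_toy, y_toy))
```

On this toy input, only the two samples facing each other across the class boundary are retained, which is one plausible way a graph-based criterion could shrink a training set while keeping the samples that define the decision boundary.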